

Vital Elements of Calculus Series

All About Derivatives: Part 2

In the previous post we described in words and pictures what the derivative at a point is - in this post we get more formal and describe these ideas mathematically.

In [1]:
# imports from custom library
import sys
sys.path.append('../../')
from mlrefined_libraries import calculus_library as calclib
from mlrefined_libraries import basics_library as baslib

# import autograd
import autograd.numpy as np

1. How do we define the tangent line given by the derivative?

We will start our discussion with functions that take in only one input, like the familiar sinusoidal function

\begin{equation} g(w) = \text{sin}(w) \end{equation}

which takes in the single input $w$ (we generalize afterwards to functions that take in more than one input).

Remember what we said in words and pictures previously about the derivative of a function at a point: it defines a line that is tangent to the function at that point, encodes the function's steepness there, and closely matches the function near that point.

The derivative at a point is the slope of the tangent line at that point.

How can we more formally describe such a tangent line and derivative?

1.1 Secant lines

In the image below, the left panel shows the sinusoid, where we have plugged the input point $w_0 = 0$ into the sinusoid and highlighted <font color = #32cd32> the corresponding point $(0, \text{sin}(0))$ in green </font>. In the middle panel we plot another point on the curve with input $w_1 = -2.6$, <font color = 'blue' > the point $(-2.6, \text{sin}(-2.6) ) $ in blue </font>, together with <font color = 'red'> the secant line in red </font> formed by connecting <font color = 'blue'> $(-2.6, \text{sin}(-2.6) ) $ </font> and <font color = #32cd32> $(0, \text{sin}(0))$ </font>. Finally, in the right panel we show <font color = #32cd32> the tangent line at $w = 0$ in lime green. </font> The <font color = 'gray' > gray vertical dashed lines </font> in the middle panel are there for visualization purposes only.

A secant line is just a line formed by taking any two points on a function, like our sinusoid, and connecting them with a straight line. A tangent line, on the other hand, may cross the graph of a function at several points, but it is defined using only a single point. In short: a secant line is defined by two points, a tangent line by just one.

The equation of any secant line is easy to derive: a line is determined by its slope together with any one point on it, and the slope can be computed from any two points on the line (such as the two points we used to define the secant to begin with).

The slope, the line's 'steepness' or 'rise over run', is the ratio of the change in output $g(w)$ to the change in input $w$. Using two generic inputs $w_0$ and $w_1$ (above we chose $w_0 = 0$ and $w_1 = -2.6$), we can write the slope of a secant line generally as

\begin{equation} \text{slope of a secant line} = \frac{g(w_1) - g(w_0)}{w_1 - w_0} \end{equation}

Using the point-slope form of a line, we can now write out the equation of the secant directly from the slope above and either of the two points that define it. Using $(w_0, g(w_0))$, the equation of the secant line $h(w)$ is

\begin{equation} h(w) = g(w_0) + \frac{g(w_1) - g(w_0)}{w_1 - w_0}(w - w_0) \end{equation}
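To make the point-slope formula concrete, here is a minimal sketch in plain Python. It assumes scalar float inputs and uses the standard `math` module rather than the notebook's `autograd.numpy`; the helper name `secant_line` is a hypothetical one of ours, not part of `mlrefined_libraries`. It returns $h$ as a function, so the secant can be evaluated at any input $w$.

```python
import math

def secant_line(g, w0, w1):
    """Return the secant line h(w) through (w0, g(w0)) and (w1, g(w1)).

    Hypothetical helper sketching the point-slope formula; g is any
    function of a single scalar input.
    """
    if w1 == w0:
        raise ValueError("a secant line needs two distinct inputs w0 != w1")
    # rise over run between the two points on the curve
    slope = (g(w1) - g(w0)) / (w1 - w0)
    # point-slope form anchored at (w0, g(w0))
    return lambda w: g(w0) + slope * (w - w0)

# usage: the secant of the sinusoid through w0 = 0 and w1 = -2.6
h = secant_line(math.sin, 0.0, -2.6)
print(h(0.0), h(-2.6), math.sin(-2.6))
```

By construction the secant passes through both of its defining points, so `h(-2.6)` matches `math.sin(-2.6)` up to floating point rounding.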

If we think about our <font color = #32cd32> green point </font> at $w_0 = 0$ as fixed, then the tangent line at this point can be thought of as the line we get when we shift the <font color = 'blue' > blue point </font> very close - infinitely close actually - to the green one.

Example. Secant line computation

Taking $w_0 = 0$ and $w_1 = -2.6$ the equation of the secant line connecting $(w_0,\text{sin}(w_0))$ and $(w_1,\text{sin}(w_1))$ on the sinusoid is given as

$$h(w) = \text{sin}(0) + \frac{\text{sin}(-2.6) - \text{sin}(0)}{-2.6 - 0}(w - 0)$$

Since $\text{sin}(0) = 0$ and $\text{sin}(-2.6) \approx -0.5155$ we can write this as

$$h(w) = \frac{0.5155}{2.6}w \approx 0.1983w$$
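We can double check the arithmetic of this example numerically. This short sketch (plain Python with the `math` module, our own choice rather than the notebook's setup) evaluates the slope of the secant between $w_0 = 0$ and $w_1 = -2.6$.

```python
import math

w0, w1 = 0.0, -2.6
# rise over run between (w0, sin(w0)) and (w1, sin(w1))
slope = (math.sin(w1) - math.sin(w0)) / (w1 - w0)

print(f"sin(-2.6) = {math.sin(w1):.4f}")   # sin(-2.6) = -0.5155
print(f"secant slope = {slope:.4f}")       # secant slope = 0.1983
```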

1.2 From secant to tangent line

The next Python cell activates a slider-based animation widget that illustrates precisely this idea. As you shift the slider from left to right the <font color = 'blue'> blue point </font> - along with the <font color = 'red'> red secant line </font> that passes through it and the <font color = #32cd32 > green point </font> - moves closer and closer to our fixed point. Finally - when the two points lie right on top of each other - the <font color = 'red'> secant line </font> becomes the <font color = #32cd32> green tangent line </font> at our fixed point.

In [2]:
# the function to play with; the fixed point where we show tangency is set by w_init in draw_it below
g = lambda w: np.sin(w)

# create an instance of the visualizer with this function
st = calclib.secant_to_tangent.visualizer(g = g)

# run the visualizer for our chosen input function and initial point
st.draw_it(w_init = 0, num_frames = 200)

In sliding back and forth, notice that it does not matter whether we start to the left of our fixed point and move right towards it, or start to the right of the fixed point and move left towards it: either way the secant line gradually becomes tangent to the curve at $w_0 = 0$. There is no big 'jump' in the slope of the line if we wiggle the slider ever so slightly to the left or right of the fixed point; the slopes of the nearby secant lines are very similar to that of the tangent.

This is exactly the behavior that defines a derivative: we can approach the fixed point from either the left or the right, and from both directions the secant line smoothly becomes tangent with no jump in the value of its slope.

If the slopes of the secant lines approach the same value from both the left and the right of a fixed point on a function, with no jump between the two sides, we say that the function has a derivative at this point, or equivalently that it is differentiable at the point. A function that has a derivative at every point is called differentiable.

Example. The hyperbolic tangent, squared

Many functions, like our sinusoid, the cosine, and polynomials, are differentiable at every point, or just differentiable for short. You can tinker with the previous Python cell (pick another fixed point!) and see this for yourself. You can also change the function: for example, in the next cell we use the same slider mechanism to show that the function

\begin{equation} g(w) = \text{tanh}(w)^2 \end{equation}

has a derivative at the point $w_0 = 1$.

In [3]:
# the function to play with; the fixed point where we show tangency is set by w_init in draw_it below
g = lambda w: np.tanh(w)**2

# create an instance of the visualizer with this function
st = calclib.secant_to_tangent.visualizer(g = g)

# run the visualizer for our chosen input function and initial point
st.draw_it(w_init = 1, num_frames = 300)

Example. A failure case: the rectified linear unit

Note that requiring the slope of the secant line to change smoothly into the slope of the tangent line from both directions, from the left and from the right, is an essential part of this definition. There are plenty of functions where this does not happen at every point, like the function

\begin{equation} g(w) = \text{max}(0,w) \end{equation}

at the point $w_0 = 0$. This function is called a rectified linear unit or relu for short. Using the slider widget we can see that the slope of the secant line visibly jumps at this point. Move the slider back and forth around where $w = 0$ and watch the slope of the secant jump distinctly from zero to one. Because the slopes of the secant lines just to the left and right of the fixed point $w_0 = 0$ fail to line up, the function does not have a derivative here. So try as you might, the line will never turn green.

In [4]:
# the function to play with; the fixed point where we show tangency is set by w_init in draw_it below
g = lambda w: np.maximum(w,0)

# create an instance of the visualizer with this function
st = calclib.secant_to_tangent.visualizer(g = g)

# run the visualizer for our chosen input function and initial point
st.draw_it(w_init = 0, num_frames = 200,mark_tangent = False)

1.3 From secant slope to derivative

With this in mind, how can we compute the equation of the tangent line at some point $w_0$ of a given function? More specifically, how can we compute the derivative there, that is, the slope of this tangent line? We know that if we take another point $w_1$ on either side of $w_0$ and connect the two, we create the secant line with equation

\begin{equation} h(w) = g(w_0) + \frac{g(w_1) - g(w_0)}{w_1 - w_0}(w - w_0) \end{equation}

and that as we push $w_1$ ever closer to $w_0$ this secant becomes the tangent line. Notice that $w_1$ appears only in the slope of this equation, so the slope is the only quantity that changes as $w_1$ approaches $w_0$ and the secant line becomes tangent at $w_0$. This is great, because it means that to understand the tangent line we can focus solely on what happens to the slope of the secant, whose limiting value is precisely the derivative (the slope of the tangent line) that we are after.

Remember that the slope of a line measures its steepness, or 'rise over run', which is the change in its vertical value ($g(w_1) - g(w_0)$) over the change in its horizontal value ($w_1 - w_0$). In other words

\begin{equation} \text{slope of secant line} = \frac{\text{change in $g$}}{\text{change in $w$}} = \frac{g(w_1) - g(w_0)}{w_1 - w_0} \end{equation}

As $w_1$ inches ever closer to $w_0$, from either the left or the right of $w_0$, the changes in both $g$ and $w$ become incredibly small, or infinitesimal. This is how the derivative is conceptually defined: as the slope of a secant line where $w_1$ is so close to $w_0$ that the changes in $g$ and $w$ are both infinitesimal. And remember: the value of this slope needs to be the same whether $w_1$ lies to the left or to the right of $w_0$.

The derivative of a function $g$ at a point $w_0$ is the slope of the tangent line there, which in turn is the slope of a secant line where $w_1$ is so close to $w_0$ that both the change in $g$ and the change in $w$ defining its slope are infinitesimally small.
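The following sketch pushes $w_1$ toward a fixed $w_0$ from both sides and prints the resulting secant slopes. It uses a hypothetical helper `secant_slope` and the plain Python `math` module. We pick $w_0 = 1$ rather than $0$ because the sinusoid satisfies $\text{sin}(-w) = -\text{sin}(w)$, so at $w_0 = 0$ the left and right slopes would agree trivially; at $w_0 = 1$ they genuinely differ for large gaps and only settle together as the gap shrinks.

```python
import math

def secant_slope(g, w0, w1):
    """Slope of the secant line through (w0, g(w0)) and (w1, g(w1))."""
    return (g(w1) - g(w0)) / (w1 - w0)

w0 = 1.0
for gap in [1.0, 0.1, 0.01, 0.001, 0.0001]:
    right = secant_slope(math.sin, w0, w0 + gap)  # w1 lies to the right of w0
    left = secant_slope(math.sin, w0, w0 - gap)   # w1 lies to the left of w0
    print(f"gap={gap:<7} right={right:.6f} left={left:.6f}")
```

As the gap shrinks, the right-hand and left-hand columns settle toward the same value, which is the behavior the definition asks for.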

Refining the definition of the derivative

Let's quantify this definition more explicitly using mathematical notation, first by backing off the 'infinitesimally small' part for a moment and just making the difference very small. Denote by $\epsilon$ some small positive number (e.g., $\epsilon = 0.0001$); then the point $w_1 = w_0 + \epsilon$ lies very close to, and just to the right of, $w_0$. The slope of the secant line connecting $(w_0,g(w_0))$ to $(w_0 + \epsilon, g(w_0 + \epsilon))$ is then given as

\begin{equation} \frac{g(w_1) - g(w_0)}{w_1 - w_0} = \frac{g(w_0 + \epsilon) - g(w_0)}{w_0 + \epsilon - w_0} = \frac{g(w_0 + \epsilon) - g(w_0)}{\epsilon} \end{equation}

To ensure that this value is indeed close to the derivative, we also need to check that the slope of this secant line is very similar to the slope of a secant line through $(w_0, g(w_0))$ and a point slightly to the left of $w_0$. Using the same value of $\epsilon$, the point $w_0 - \epsilon$ lies just to the left of $w_0$. Forming the secant connecting the points $(w_0, g(w_0))$ and $(w_0 - \epsilon, g(w_0 - \epsilon))$, we can compute its slope as

\begin{equation} \frac{g(w_1) - g(w_0)}{w_1 - w_0} = \frac{g(w_0 - \epsilon) - g(w_0)}{w_0 - \epsilon - w_0} = - \frac{g(w_0 - \epsilon) - g(w_0)}{\epsilon} \end{equation}

If there is indeed a derivative at $w_0$ then the value of this slope needs to closely match the slope of our first secant, or in other words

\begin{equation} \frac{g(w_0 + \epsilon) - g(w_0)}{\epsilon} \approx - \frac{g(w_0 - \epsilon) - g(w_0)}{\epsilon} \end{equation}

Moreover, as we make $\epsilon$ smaller and smaller, these two quantities should both settle down to a single value and become equal to each other.
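To see the two one-sided slopes settle toward each other, here is a minimal sketch using the squared hyperbolic tangent from the earlier example at $w_0 = 1$. The helper names `right_slope` and `left_slope` are our own (hypothetical), and we use Python's `math.tanh` in place of the notebook's `np.tanh`.

```python
import math

def tanh_squared(w):
    return math.tanh(w) ** 2

def right_slope(g, w0, eps):
    """Slope of the secant through (w0, g(w0)) and (w0 + eps, g(w0 + eps)), eps > 0."""
    return (g(w0 + eps) - g(w0)) / eps

def left_slope(g, w0, eps):
    """Slope of the secant through (w0, g(w0)) and (w0 - eps, g(w0 - eps)), eps > 0."""
    return -(g(w0 - eps) - g(w0)) / eps

w0 = 1.0
for eps in [0.1, 0.01, 0.001, 0.0001]:
    rs = right_slope(tanh_squared, w0, eps)
    ls = left_slope(tanh_squared, w0, eps)
    print(f"eps={eps:<7} right={rs:.6f} left={ls:.6f} gap={abs(rs - ls):.2e}")
```

For this smooth function the printed gap between the two slopes shrinks roughly in proportion to $\epsilon$, while both slopes approach a common value near $0.6397$.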

Notice that we can express this more compactly if we let $\epsilon$ be a small (in magnitude) positive or negative number. Then we can equivalently say that we want the quantity

\begin{equation} \frac{g(w_0 + \epsilon) - g(w_0)}{\epsilon} \end{equation}

to settle down to a single value as we make $\epsilon$ smaller and smaller in magnitude. When $\epsilon < 0$ the point $w_0 + \epsilon$ lies to the left of $w_0$, and this formula reproduces the left-hand slope computed above. So we can still read this compact formula as the slope of secant lines on either side of $w_0$, which get ever closer to $w_0$ from both sides as we make the magnitude of $\epsilon$ infinitesimally small.

Writing this algebraically we say that we want the value $ \frac{g(w_0 + \epsilon) - g(w_0)}{\epsilon} $ to converge to a single value as $\vert\epsilon\vert \longrightarrow 0$.
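The compact form needs only one formula, since a negative $\epsilon$ places $w_0 + \epsilon$ to the left of $w_0$. This sketch uses a hypothetical helper `difference_quotient` and a made-up example function $g(w) = w^3$ at $w_0 = 2$, chosen because the quotient simplifies by hand to $\frac{(2+\epsilon)^3 - 8}{\epsilon} = 12 + 6\epsilon + \epsilon^2$, so we know exactly what it should settle to.

```python
def difference_quotient(g, w0, eps):
    """(g(w0 + eps) - g(w0)) / eps for a nonzero eps of either sign.

    eps > 0 gives the slope of a secant to the right of w0,
    eps < 0 gives the slope of a secant to the left of w0.
    """
    return (g(w0 + eps) - g(w0)) / eps

def cube(w):
    # hypothetical example function g(w) = w^3
    return w ** 3

w0 = 2.0
for eps in [0.1, -0.1, 0.01, -0.01, 0.001, -0.001]:
    print(f"eps={eps:+.3f}  quotient={difference_quotient(cube, w0, eps):.6f}")
```

As $\vert\epsilon\vert$ shrinks, the printed quotients close in on $12$ from above (positive $\epsilon$) and from below (negative $\epsilon$).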

Common notations for the derivative

One common notation for this ratio of infinitesimal changes, $\frac{\text{infinitesimal change in $g$}}{\text{infinitesimal change in $w$}}$, is $\frac{\mathrm{d}g}{\mathrm{d}w}$. Here the symbol $\mathrm{d}$ means 'infinitesimal change in the value of'. A common variation on this notation pulls the $g$ out of the fraction, like this: $\frac{\mathrm{d}}{\mathrm{d}w}g$. In short, we have both the definition of and a symbol for the derivative of $g$ at any point:

\begin{equation} \text{derivative} = \frac{\text{infinitesimal change in $g$}}{\text{infinitesimal change in $w$}}:= \frac{\mathrm{d}g}{\mathrm{d}w} \,\,\, \text{or} \,\,\, \frac{\mathrm{d}}{\mathrm{d}w}g \end{equation}

There are other notations commonly used in practice to denote the derivative, but we will stick to using these.

To denote the derivative at a specific point $w_0$ we will write

\begin{equation} \frac{\mathrm{d}}{\mathrm{d}w}g(w_0) \end{equation}

Example. Computing approximate derivatives at a point

Take our sinusoid, the point $w_0 = 0$, and a small value for $\epsilon$ such as $\epsilon = 0.0001$. Computing the slope of the secant line where $w_1 = w_0 + \epsilon$ lies just to the right of $w_0$, we have

$$ \frac{g(w_0 + \epsilon) - g(w_0)}{\epsilon} = \frac{\text{sin}(0.0001)}{0.0001}\approx 0.9999999983$$

Likewise, computing the slope of the secant line where $w_1 = w_0 - \epsilon$ lies just to the left of $w_0$, we have

$$ -\frac{g(w_0 - \epsilon) - g(w_0)}{\epsilon} = -\frac{\text{sin}(-0.0001)}{0.0001}\approx 0.9999999983$$

Both slopes agree (here they are in fact identical, since $\text{sin}(-w) = -\text{sin}(w)$), so at $w_0 = 0$ we can say

$$ \frac{\mathrm{d}}{\mathrm{d}w}g(w_0) \approx 0.9999999983 \approx 1 $$

Using this we can write out the equation of the tangent line to the sinusoid at $w_0 = 0$ as

$$ h(w) = \text{sin}(0) + 0.9999999983(w - 0) \approx w$$
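Here is a minimal sketch, in plain Python with the `math` module, that builds the tangent line at $w_0 = 0$ from the approximate derivative computed with $\epsilon = 0.0001$ and compares it with the sinusoid at a few inputs. The function name `tangent` is just a local, hypothetical helper. The comparison illustrates the earlier claim that the tangent line closely matches the function near the point, and only there.

```python
import math

eps = 0.0001
w0 = 0.0
# approximate derivative: slope of the secant just to the right of w0
slope = (math.sin(w0 + eps) - math.sin(w0)) / eps

def tangent(w):
    """Tangent line h(w) = g(w0) + slope * (w - w0) for g = sin."""
    return math.sin(w0) + slope * (w - w0)

for w in [0.01, 0.1, 0.5, 2.0]:
    print(f"w={w:<5} sin(w)={math.sin(w):.6f}  h(w)={tangent(w):.6f}")
```

Near $w = 0$ the two columns agree to several decimal places, while at $w = 2$ they are far apart: the tangent line is only a local match.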

Example. Checking non-differentiability at $w = 0$ for the relu function

To check the differentiability of the relu function

$$ g(w) = \text{max}(0,w) $$

at $w_0 = 0$, we first compute the slope of a secant line coming from the right, where $w_1 = w_0 + \epsilon$ for some small $\epsilon > 0$ (e.g., $\epsilon = 0.0001$):

$$ \frac{g(w_0 + \epsilon) - g(w_0)}{\epsilon} = \frac{\text{max}(0,0.0001)}{0.0001}= \frac{0.0001}{0.0001} = 1$$

A similar computation where $w_1 = w_0 - \epsilon$ comes in from the left gives

$$ -\frac{g(w_0 - \epsilon) - g(w_0)}{\epsilon} = -\frac{\text{max}(0,-0.0001)}{0.0001}= -\frac{0}{0.0001} = 0$$

These computations give the same values no matter how small we make $\epsilon$: the slope from the right is always $1$ and the slope from the left is always $0$. Since the two secant slopes never match up, the relu function is not differentiable at $w_0 = 0$.
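The relu computation above can be repeated for several values of $\epsilon$ with a short sketch. We use Python's built-in `max` on plain floats in place of the notebook's `np.maximum`, and `relu` is just a local helper name. The left-hand slope is written in the algebraically equivalent form $\frac{g(w_0) - g(w_0 - \epsilon)}{\epsilon}$.

```python
def relu(w):
    return max(0.0, w)

w0 = 0.0
for eps in [0.1, 0.001, 1e-8]:
    right = (relu(w0 + eps) - relu(w0)) / eps   # secant coming from the right
    left = (relu(w0) - relu(w0 - eps)) / eps    # secant coming from the left
    print(f"eps={eps:<6} right={right} left={left}")   # right=1.0 left=0.0 on every line
```

Shrinking $\epsilon$ never brings the two slopes together, which is exactly the jump we saw with the slider.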

Up next

In future posts in our 'All About Derivatives' series we continue our discussion of derivatives of single input functions discussing topics including derivative formulae and automatic differentiation, and then discuss how to extend all of these ideas to functions taking in multiple inputs.

The content of this notebook is supplementary material for the textbook Machine Learning Refined (Cambridge University Press, 2016). Visit http://mlrefined.com for free chapter downloads and tutorials, and our Amazon site for details regarding a hard copy of the text.